In this tutorial notebook we'll do the following things:
First we need to import some libraries
In [249]:
import math
import matplotlib.pyplot as plt
import numpy as np
import scipy
import scipy.stats
In [250]:
TRUE_MEAN = 40
TRUE_STD = 10
X = numpy.random.normal(TRUE_MEAN, TRUE_STD, 1000)
Now we'll define functions that given our data, will compute the MLE for the $\mu$ and $\sigma$ parameters of the normal distribution.
Recall that
$$\hat\mu = \frac{1}{T}\sum_{t=1}^{T} x_t$$$$\hat\sigma = \sqrt{\frac{1}{T}\sum_{t=1}^{T}{(x_t - \hat\mu)^2}}$$
In [251]:
def normal_mu_MLE(X):
# Get the number of observations
T = len(data)
# Sum the observations
s = sum(X)
return 1.0/T * s
def normal_sigma_MLE(X):
T = len(X)
# Get the mu MLE
mu = normal_mu_mle(X)
# Sum the square of the differences
s = sum( math.pow((X - mu), 2) )
# Compute sigma^2
sigma_squared = 1.0/T * s
return math.sqrt(sigma_squared)
Now let's try our functions out on our sample data and see how they compare to the built-in np.mean
and np.std
In [252]:
print "Mean Estimation"
print normal_mu_mle(X)
print np.mean(X)
print "Standard Deviation Estimation"
print normal_sigma_mle(X)
print np.std(X)
Now let's estimate both parameters at once with scipy's built in fit()
function.
In [253]:
mu, std = scipy.stats.norm.fit(X)
print "mu estimate: " + str(mu)
print "std estimate: " + str(std)
Now let's plot the distribution PDF along with the data to see how well it fits. We can do that by accessing the pdf provided in scipy.stats.norm.pdf
.
In [254]:
pdf = scipy.stats.norm.pdf
# We would like to plot our data along an x-axis ranging from 0-80 with 80 intervals
# (increments of 1)
x = np.linspace(0, 80, 80)
h = plt.hist(X, bins=x, normed='true')
l = plt.plot(pdf(x, loc=mu, scale=std))
In [255]:
TRUE_LAMBDA = 5
X = np.random.exponential(TRUE_LAMBDA, 1000)
numpy
defines the exponential distribution as
$$\frac{1}{\lambda}e^{-\frac{x}{\lambda}}$$
So we need to invert the MLE from the lecture notes. There it is
$$\hat\lambda = \frac{T}{\sum_{t=1}^{T} x_t}$$Here it's just the reciprocal, so
$$\hat\lambda = \frac{\sum_{t=1}^{T} x_t}{T}$$
In [256]:
def exp_lamda_MLE(X):
T = len(X)
s = sum(X)
return s/T
In [257]:
print "lambda estimate: " + str(exp_lamda_MLE(X))
In [258]:
# The scipy version of the exponential distribution has a location parameter
# that can skew the distribution. We ignore this by fixing the location
# parameter to 0 with floc=0
_, l = scipy.stats.expon.fit(X, floc=0)
In [259]:
pdf = scipy.stats.expon.pdf
x = range(0, 80)
h = plt.hist(X, bins=x, normed='true')
l = plt.plot(pdf(x, scale=l))
In [260]:
prices = get_pricing('TSLA', fields='price', start_date='2014-01-01', end_date='2015-01-01')
# This will give us the number of dollars returned each day
absolute_returns = np.diff(prices)
# This will give us the percentage return over the last day's value
# the [:-1] notation gives us all but the last item in the array
# We do this because there are no returns on the final price in the array.
returns = absolute_returns/prices[:-1]
Let's use scipy
's fit function to get the $\mu$ and $\sigma$ MLEs.
In [261]:
mu, std = scipy.stats.norm.fit(returns)
pdf = scipy.stats.norm.pdf
x = np.linspace(-1,1, num=100)
h = plt.hist(returns, bins=x, normed='true')
l = plt.plot(x, pdf(x, loc=mu, scale=std))